I. INTRODUCTION


A. BACKGROUND 


With the rapid advances in computer technologies, both
hardware and software, computer systems are becoming more
complex. Together with their wide application, this makes
increasing system reliability vitally important [Ref. 1].
Providing fault tolerance is one of the attractive
solutions.


Specific reasons for the use of fault tolerance exist in
several areas of present applications. Examples are:

1. Safety related failures, such as in medical support or
defense systems.

2. Failures and short outages that cause economic penalties,
such as in telephone, banking, and time sharing systems.

3. Environments where manual maintenance is not possible,
such as space, sea or undersea [Ref. 2].


In general, fault tolerance consists of three sequential
steps:

1. Detection 

2. Diagnosis 

3. Recovery 

The detected error is analyzed to isolate the fault
cause. Recovering from the failure requires at least a
back-up system such as secondary memory, a spare processor
and system buses in real time systems. As a consequence,
computing machines become larger and more complex. During
normal fault free operation, fault tolerance does not
provide any performance advantages, but it is the insurance
of the logic machine against disruptive physical events
[Ref. 3].

This thesis is intended to focus on the area of
implementing a serial bus communication subsystem as a
secondary bus structure of the real time multitransputer
system to provide for the fault tolerance requirement.


C. MOTIVATION 


To explain the motivation of this thesis, it is better
to give some quantitative information about the computing
capacity and the performance of the transputer, IMS T424
[Ref. 4].


TABLE 1
HARDWARE CAPABILITY OF THE IMS T424 


DEG@CES SO tna 32 bits (4.9 * 10° Gmrecew 


processing speed ...... 10 MIPS (950 nanosec. multiply)
memory capacity ....... 32 bit address bus, 4 GBytes
built-in memory ....... 4 KBytes RAM, 80 MBytes/sec
serial bus ............ 4 INMOS links, 1.5 Mbytes/sec
parallel bus .......... 25 Mbytes/sec max. transfer
peripheral interface .. 8 bits bidir., 4 Mbytes/sec
power dissipation ..... 0.9 Watts
size .................. 45 mm² chip mounted in an 84
                        contact leadless chip carrier





At first glance the quantities in Table 1 seem attractive
enough to justify using the transputer simply as a substitute
processor in an ordinary uniprocessor system.




The important hardware feature of the transputer is the 
four bidirectional serial communication channels. Therefore 
it is possible to obtain multiple communication paths 
between two elements of the multitransputer system, using 
some structures of the transputers. These paths provide the 
graceful degradation for redundant multitransputer systems. 
In other words, we can simply by-pass the failed element in 
the multisystem, and processing continues with other 
elements of the system. 

Another important feature of the transputer is that it
provides excellent hardware for concurrent processing. In
other words, the transputer has been designed using a reduced
instruction set architecture which implements the OCCAM
concurrent programming language efficiently [Ref. 5].


D. THESIS STRUCTURE


The introduction just presented is designed to provide
the reader with a brief look at fault tolerance and, in
particular, at the development decisions on which a serial
bus communication structure is based.

Chapter II will describe the hardware architecture and
the capabilities of the multitransputer systems. Chapter III
will provide a brief explanation of the OCCAM concurrent
processing language and its features. Chapter IV will
outline the fault tolerant system structure and details of
the serial bus communication process. In Chapter V the
performance of the serial bus communication is evaluated and
compared with fault free system performance.

The final chapter presents conclusions and observations
that resulted from this thesis effort and suggestions for
further research. Eight appendices are also provided that
give detailed descriptions of the subprocesses of the serial
bus communication program and its implementation.




II. HARDWARE FEATURES 


A. WHAT IS THE TRANSPUTER?


The transputer, or transistor computer, is a single chip
computer which provides a direct implementation of the
process model of computing, in which each process is an
independent computation with its own data and program. The
processes are executed in a time shared mode on the
transputer and special instructions are provided to support
the process model of communication.

The term "Transputer" also reflects the device's ability
to be used as a system building block. The word is derived
from 'transistor' and 'computer', since the transputer is
both a computer on a chip and a silicon component like a
transistor. Just as the use of logic gates and Boolean
algebra provides the design methodology of present
electronic systems, so the transputer together with the formal
rules of OCCAM provides the design methodology for future
concurrent systems.


The detailed descriptions of the components of the
transputer (processor, memory, links and peripheral
interface) will be introduced in the next four subsections,
respectively.

1. Processor


The IMS T424 transputer has a 32 bit processor with an
instruction execution rate of 10 MIPS. Typical instructions
carried out by the processor and their execution times are
given in Table 2.

TABLE 2
IMS T424 INSTRUCTION LIST AND EXECUTION TIMES

The processor is optimized to implement the OCCAM
programming language. It is designed for performance and
efficiency, which is achieved by designing the instruction
set to simplify instruction decoding, by having a minimum
number of special registers, by incorporating useful
functions into the registers and by the use of an evaluation
stack. This approach simplifies compiler design, since all
the operands are in a uniformly addressed data space, and it
also gives a fast process switch time.

The instruction set is compact. This is achieved by
separating data access from manipulation. Most instructions
are one byte long and are divided into two four bit fields:
function and operand. A sequence of bytes can be used to
extend the instruction in units of four bits up to a word
length, enabling both functions and operands to be frequency
encoded.
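
The byte layout described above can be illustrated with a short Python sketch. It is only a model of the encoding as the text describes it: the value chosen for PREFIX and the names split_instruction and decode are assumptions made for this example, not entries from the IMS T424 instruction list, and the word length limit is not enforced.

```python
PREFIX = 0x2  # hypothetical function code meaning "extend the operand"

def split_instruction(byte):
    """Split one instruction byte into its two four-bit fields."""
    function = (byte >> 4) & 0xF
    operand = byte & 0xF
    return function, operand

def decode(stream):
    """Decode a byte sequence into (function, operand) pairs.

    Each PREFIX byte contributes four more operand bits to the
    instruction that follows it.
    """
    decoded = []
    operand = 0
    for byte in stream:
        function, nibble = split_instruction(byte)
        operand = (operand << 4) | nibble
        if function == PREFIX:
            continue              # keep accumulating operand bits
        decoded.append((function, operand))
        operand = 0               # the next instruction starts afresh
    return decoded
```

Under these assumptions a small operand fits in a single byte, while a larger one costs one extra byte per additional four bits, which is what makes frequency encoding of short, common operands attractive.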

The processor is designed to execute high level
languages efficiently and will normally be programmed using
OCCAM or an industry standard language such as C or PASCAL.
a. Concurrent Processing 


The processor executes programs sequentially. It
implements parallel processes by sharing its time between
the set of processes which are active at any instant. A
process is active when it is not waiting for input or
output.

The currently executed process runs until it has
to wait for communication. When this happens, the process is
set inactive and the next process on the active queue starts
to execute. When a communication channel becomes ready, the
message is passed, and the waiting process is linked to the
end of the active process queue. That process then
continues to execute whenever its turn in the queue comes
up.
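
The queue discipline just described can be modelled in a few lines of Python. This is a sketch under stated assumptions, not the transputer's microcoded scheduler: processes are represented by generators that yield the name of the channel they are waiting on, and the class and method names are invented for the example.

```python
from collections import deque

class Scheduler:
    """Toy model of a single active-process queue."""

    def __init__(self):
        self.active = deque()   # processes ready to run, in turn
        self.waiting = {}       # channel name -> process waiting on it
        self.trace = []         # order in which processes were run

    def start(self, name, body):
        self.active.append((name, body))

    def channel_ready(self, channel):
        """Link the process waiting on this channel to the end of the queue."""
        process = self.waiting.pop(channel, None)
        if process is not None:
            self.active.append(process)

    def run_one(self):
        """Run the front process until it waits for communication or ends."""
        name, body = self.active.popleft()
        self.trace.append(name)
        try:
            channel = next(body)            # process now waits on a channel
        except StopIteration:
            return                          # process terminated
        self.waiting[channel] = (name, body)

def worker(channel):
    yield channel   # wait once for communication, then finish
```

A process that becomes ready again always joins the back of the queue, so the order of resumption follows the order in which the channels became ready.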
b. Priority Processes


The transputer T424 supports two levels of
priority: high and low. A PRIPAR (prioritized
parallel) process may have two components. A queue of active
processes is maintained for each priority level. A priority
1 (low priority) process is executed whenever there are no
active priority 0 (high priority) processes.

If there are no active priority 0 processes, the
latency (that is, the time from an external channel becoming
ready to the start of the first instruction of the relevant
waiting priority 0 process) is typically 600 ns (maximum
2600 ns). Otherwise, if a priority 0 process is already
executing, the relevant waiting process is linked to the end
of the priority 0 queue.
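
The selection rule between the two queues can be sketched as follows. The function name select_next and the dictionary layout are assumptions made for illustration; only the rule itself (low priority runs only when no high priority process is active) comes from the text.

```python
from collections import deque

HIGH, LOW = 0, 1   # priority 0 is high, priority 1 is low

def select_next(queues):
    """Return the next process to run, or None if both queues are empty.

    A low priority process is chosen only when the high priority
    queue holds no active process.
    """
    for level in (HIGH, LOW):
        if queues[level]:
            return queues[level].popleft()
    return None
```

For example, with one high priority process and two low priority processes queued, the high priority process is always taken first, and a high priority process that becomes active later is taken ahead of any low priority process still waiting.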
c. Performance


The size of a program is given by the sum of
the sizes of its program elements. All timing averages, and
the maximum time to execute the program, are given by the sum
of the times to execute the individual program elements.

If the program is held in the external memory,
the external program fetch time must be added to obtain the
program execution time. Because of the instruction lookahead
and the overlap with internal memory, this overhead will
usually be small.

If data is held in external memory, the external
data access time must also be added to obtain the program
execution time. The processor shares memory cycles with its
input/output interfaces. Each concurrent access by an
interface channel delays the processor by an average of 30 ns.
The maximum reduction in performance is 10%; under typical
conditions the reduction is negligible.


2. Memory 


The 4 Kbytes of built-in static memory provide a maximum
data transfer rate of 80 Mbytes/sec. The memory interface is
a 32 bit multiplexed data and address bus. It extends the
internal address capability to a total of 4 GBytes in a
single linear address space.

The non-multiplexed cycle provides timing signals to
drive industry standard RAMs and ROMs. The multiplexed
cycle provides timing signals for RAS¹ and CAS² and control
for an external address multiplexer. The interface can also
provide a CAS before RAS refresh cycle.


¹ RAS: Row Address Strobe
² CAS: Column Address Strobe




An asynchronous wait input is provided so that the 


memory timing can also be determined externally if required. 





Figure 2.1 Memory Interface Driving Static RAM's 


Higher performance can be obtained using static
memory. The logic diagram of typical connections to static
memory is shown in Figure 2.1. The cycle to access that
memory does not require the phases for address multiplexing
and so is completed in 150 nanoseconds, giving a data rate of
25 Mbytes/second maximum.
3. Links


The IMS T424 has four standard INMOS links providing
high speed intercommunication between transputer products,
enabling a rich variety of networks to be constructed. Each
link operates independently and provides a memory to memory
block message transfer capability between transputers.

Each link implements two OCCAM channels. The link
consists of an output and an input, both of which are used to
carry data and link control information. A message is
transmitted as a sequence of bytes.

After transmitting a data byte, the sending
transputer waits until an acknowledge has been received,
signifying that the receiving transputer is ready to receive
another byte.

The receiving transputer can transmit an acknowledge
as soon as it starts to receive a data byte, so that
transmission is normally continuous. This asynchronous protocol
guarantees reliable transmission in spite of possible delays
in either the sending or receiving transputer.

During transmission of a message, both sending and
receiving processes will be set inactive, and they will only
be linked to the end of their respective active queues after
the final byte has been acknowledged.
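
The byte and acknowledge exchange described above can be imitated with two Python threads. This is a minimal sketch, assuming that two one-slot queues may stand in for the output and input wires of a link; the function names are invented for the example and no bit-level timing is modelled.

```python
import queue
import threading

def send_message(message, data_line, ack_line):
    """Send each byte, then wait for its acknowledge before the next."""
    for byte in message:
        data_line.put(byte)
        ack_line.get()                 # blocks until the receiver acknowledges

def receive_message(length, data_line, ack_line):
    """Receive `length` bytes, acknowledging each one as it arrives."""
    received = bytearray()
    for _ in range(length):
        byte = data_line.get()
        ack_line.put(True)             # ready to receive another byte
        received.append(byte)
    return bytes(received)

def transfer(message):
    """Move a whole message from a sender thread to the calling thread."""
    data_line = queue.Queue(maxsize=1)
    ack_line = queue.Queue(maxsize=1)
    sender = threading.Thread(
        target=send_message, args=(message, data_line, ack_line))
    sender.start()
    result = receive_message(len(message), data_line, ack_line)
    sender.join()
    return result
```

Because the sender never runs more than one byte ahead of the last acknowledge, a delay on either side only slows the transfer and cannot lose data, which is the property the text attributes to the protocol.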

The data rate on each link can be programmed using
the link set configuration channel [Ref. 4]. The highest
frequency is 20 Mbits/sec., giving a maximum data rate of 1.8
MBytes/s. on a channel.
4. Peripheral Interface 


The peripheral interface provides access to industry 
standard devices such as eight bit parallel controllers for 
auxiliary memory. The interface controller provides a block 
message transfer capability between memory and the periph- 
eral interface. 

The peripheral interface is an 8 bit bidirectional
bus which may be used to input and output sequences of
bytes. There are two control lines which may be used to
address external devices, and an "Event" input to provide an
interrupt capability.




The interface is accessed via four standard output
channels and four standard input channels. All eight
channels use the same 8 bit path and transfer handshake, with
the processor initiating the transfer. The transfers are
synchronized to a separate external clock, which need not
have any fixed relationship with the transputer input clock.
Asynchronous operation is also permitted, but at a lower
speed than for synchronous operation.


Externally addressable devices may be connected via
the peripheral interface, for example by using one output
channel as the address channel, another as the write data
channel, and one input channel as the read data channel.
Both addresses and data may be arbitrarily long sequences of
bytes.

The 4 Mbytes/sec. data rate provided by the interface
allows the connection of high performance peripheral
chips, without the need for FIFOs or DMA controllers.

The "Event" input may be used to communicate with
a waiting process, and hence cause it to be scheduled. This
provides an input functionally similar to an interrupt, in a
manner consistent with the process model of the transputer.
The typical latency for this interrupt is 600 nanosec. The
"Event" input can also be used to enable the peripheral
interface to respond to being accessed from a standard
microprocessor bus.


B. THE TRANSPUTER SYSTEMS CONCEPT


In the past, system performance has increased regularly
by a factor of ten each decade (Figure 2.2). This
improvement has been achieved by advances in circuit technology and
by increasingly complex systems. For the future, VLSI offers
the potential of much greater circuit complexity but only
modest increases in circuit performance.

Figure 2.2 Advances in Technology and System Throughputs


The economics of uniprocessor systems are based on the
historical perspective that processing is expensive in
comparison with memory. This has led to the von Neumann
bottleneck, where a single processor is connected to vast
amounts of memory. The economics of VLSI are different.
Today, a single wafer of silicon can contain 2 Mbytes of
memory or 256 conventional microprocessors.

To exploit this potential, it will be necessary to build
systems with a much higher degree of concurrency than is
currently possible. The transputer is designed as a
programmable component to implement such systems.

In their proposal to achieve intelligent interaction
between people and computers, the Japanese have projected
the need for computers with one thousand times the
performance of present day systems. These will only be possible
using concurrency, and the transputer has been designed to make
such fifth generation systems a possibility.

Pipelines and arrays of transputers can be used to
provide greatly increased performance by exploiting the
concurrency inherent in many applications. Two examples
which require high performance are signal processing and
database searching. Networks of transputers can provide the
performance needed for both applications.


Signal processing, such as the fast Fourier transform
algorithm, maps easily onto a pipeline. The pipeline can
accept the input samples at up to 100 kHz, which more than
covers the full audio spectrum. A 64 point FFT requires six
transputers in the pipeline, a 256 point FFT requires eight
and a 1024 point FFT requires ten transputers. A pair of
pipelines, interlinked at each stage, is able to accept input
samples at up to 200 kHz. Higher frequencies can be handled
by using more transputers in parallel [Ref. 6].
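
The transputer counts quoted above (6, 8 and 10 for 64, 256 and 1024 points) equal the base-2 logarithm of the FFT size. A small sketch makes the relation explicit, on the assumption, not stated in the text, that a radix-2 FFT is used and that each pipeline stage is carried by one transputer; the function name is chosen for the example.

```python
def pipeline_stages(points):
    """Number of radix-2 butterfly stages for an FFT of the given size."""
    if points < 2 or points & (points - 1):
        raise ValueError("points must be a power of two, at least 2")
    return points.bit_length() - 1    # exact log2 for a power of two
```

The growth is logarithmic, so quadrupling the transform size adds only two stages to the pipeline.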

A pipeline or an array can also be used to do searching.
Provided that the search requests can diffuse through the
network, and the answers converge, the shape of the network
does not matter; it can even contain faulty devices. The
full internal memory of each transputer can be searched 1000
times per second. With external memory attached to each
transputer, the search rate is slower, but 64 kbytes per
transputer can be searched at least 30 times per second.

Other applications, such as image processing, finite
element analysis, matrix manipulation, telephone switching
systems, fault tolerant systems and artificial intelligence
naturally lend themselves to arrays or networks of
transputers. For example, the array structure has been
proposed for fault tolerant multitransputer systems, as
shown in the accompanying figure.

III. SOFTWARE CAPABILITIES


OCCAM is a new programming language. It is designed to
support concurrent applications in which many parts of a
system operate separately and interact. OCCAM is relevant to
many present day applications, particularly those involving
microprocessors and real time applications. OCCAM will be
important for future applications involving the interaction
of many thousands of computing components.

The novelty of OCCAM is in its treatment of concurrency.
OCCAM enables the programmer to express a program in terms
of concurrent processes which communicate by sending
messages through communication channels. This has two
important consequences. Firstly, it gives the program a clear and
simple structure, as the individual processes operate
largely independently. Secondly, it allows the program to
exploit the performance of many computing components, as
each concurrent process may be executed by an individual
processor.

OCCAM can capture the hierarchical structure of a system 
by allowing an interconnected set of processes to be 
regarded from the outside as a single process. At any level 
of detail, the programmer is only concerned with a small and 


manageable set of processes. 


A. OCCAM PROCESSES


A process performs a sequence of actions and then
terminates. Each action may be an assignment, an input or an
output action. An assignment changes the value of a
variable, an input receives a value from a channel and an output
sends a value to a channel.




At any time between start and termination, a process may
be ready and waiting to communicate on one or more of its
channels. Communication is synchronous. When both an input
process and an output process are ready to communicate on
the same channel, the value to be transmitted is copied from
the output process to the input process. The input and output
processes then continue.
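
The synchronous behaviour described above can be sketched in Python with two semaphores. This is an illustrative model only, assuming one outputting and one inputting thread per channel; the class Channel and its method names are invented for the example and are not part of OCCAM.

```python
import threading

class Channel:
    """Unbuffered one-way channel: output and input meet, then both continue."""

    def __init__(self):
        self._value = None
        self._offered = threading.Semaphore(0)   # a value is being offered
        self._taken = threading.Semaphore(0)     # the value has been copied
        self._output_lock = threading.Lock()     # one output at a time

    def output(self, value):
        """Models 'channel ! expression': waits until an input takes the value."""
        with self._output_lock:
            self._value = value
            self._offered.release()
            self._taken.acquire()

    def input(self):
        """Models 'channel ? variable': waits until an output offers a value."""
        self._offered.acquire()
        value = self._value
        self._taken.release()
        return value
```

Neither side can run ahead of the other: output returns only after the matching input has copied the value, so the channel carries no stored state between communications.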

Each channel provides a one-way connection between two 
concurrent processes: one of the processes may write to the 
channel, and the other may read from it. 

A process may be ready and waiting to input from any one 
of a number of channels. In this case, the input is read 
from the first channel which is used for output by another 
process. 

OCCAM may be used to program a network of computers. 
Each computer with local store executes a process with local 
variables, and each connection between two computers imple- 
ments a channel between two processes. 

OCCAM may be used to program an individual computer. The
computer shares its time between the concurrent processes,
and the channels are implemented by values transmitted in
the main memory. Indeed, a program designed for a network
of connected computers may also be executed unchanged by a
single computer.


B. PRIMITIVES


There are three primitive processes from which all other
processes are constructed, as mentioned in the previous
paragraph.

These three primitive processes, given in Table 3, can
be combined sequentially or concurrently to create more
complex processes, and thus they form the building blocks
for all programs.




TABLE 3
OCCAM PRIMITIVES AND THEIR SYNTAX


PRIMITIVE       SYNTAX


INPUT           channel ? variable
OUTPUT          channel ! expression
ASSIGNMENT      variable := expression





1. Input


An input process reads a value from the channel into
a variable. The '?' symbol denotes the input process.

This primitive reads a value from the specified
channel. It provides synchronization with a concurrent
process, which outputs a synchronizing signal on the same
channel.

An input sets the value of a variable to a value
input from a channel. The input waits until an output using
the same channel is executed in parallel with the input.

A multiple input is equivalent to a sequence of
separate input processes for each variable in turn, in left
to right order. Each input is separately synchronized with
an output process being executed in parallel. Each variable
may be a simple variable, or a word or byte subscripted
element of a vector of variables.


2. Output


An output process writes the value of the expression
to the channel. The symbol '!' denotes the output primitive.

An output waits until an input using the same
channel is executed. It then outputs the value of the
expression to the channel and terminates. A multiple output
is equivalent to a sequence of outputs, which writes the
value of each expression in turn, in left to right order.
Each output is separately synchronized with an input process
executed in parallel.


3. Assignment 


An assignment process transfers the value of its
expression to the named variable.

The expression is evaluated and the variable is set
to the resulting value. The assignment process then
terminates. The variable may be a simple variable or an element
of a vector of variables selected using either byte or word
subscription.


C. STRUCTURES


1. Sequential


In many applications it is necessary to do a number
of steps one after another. The flow diagram of this
structure is given in Figure 3.1.


Figure 3.1 Flow Diagram of the Sequential Construct 




A sequential process takes the form of the keyword
SEQ followed by the component processes, each on a new line,
all at an extra level of indentation, as in Table 4.


TABLE 4
SEQUENTIAL CONSTRUCT IMPLEMENTATION


SEQ
  process 1
  process 2
  ...
  process n





SEQ ensures that each component process terminates
before the following component process is executed, and
the entire process will only terminate after the final
component process has finished.

SEQ is an example of an OCCAM constructor. This
construct (comprising the SEQ and its component processes,
taken as a whole) can be regarded as a single process.
2. Parallel 


If we require many processes to be running as a
concurrent system, we can construct a parallel process as
seen in Figure 3.2.

The keyword PAR is followed by a number of component
processes, each starting on a new line and indented, as
shown in Table 5. The effect is to execute all of the
component processes together, which is achieved by sharing
the processor time between the set of active processes.

The parallel construct terminates after all the
component processes have terminated. If there is no component
process the construct terminates immediately.

Two component processes of a parallel construct may 
communicate by sending values using a channel. One contains 
outputs to the channel, and the other contains the inputs 
from the channel. The processes are said to be connected by 
the channel. No other component of the parallel construct 


may use the same channel. 
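
The termination rule and the one-writer, one-reader use of a channel can be illustrated with Python threads. This is a sketch under stated assumptions: threads stand in for component processes, and a one-slot queue stands in for the channel (it buffers a single value, so it is slightly looser than a true OCCAM channel). The names par and demo are invented for the example.

```python
import queue
import threading

def par(*processes):
    """Run all component processes together; return when all have terminated."""
    threads = [threading.Thread(target=process) for process in processes]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()

def demo():
    channel = queue.Queue(maxsize=1)     # stand-in for an OCCAM channel
    received = []

    def producer():                      # the only component that outputs
        for value in (1, 2, 3):
            channel.put(value)

    def consumer():                      # the only component that inputs
        for _ in range(3):
            received.append(channel.get())

    par(producer, consumer)
    return received
```

Calling par with no components returns at once, which mirrors the rule that an empty parallel construct terminates immediately.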


TABLE 5
PARALLEL CONSTRUCT IMPLEMENTATION


PAR
  process 1





Variables are not used for communication between the 
component processes of a parallel construct. However, a 
variable may be used in two or more component processes, 
provided that no component process changes its value by 
input or assignment. 

In addition to the parallel construct, OCCAM
contains the prioritized parallel construct, declared as
PRIPAR. A prioritized parallel construct gives each
component process a different priority. The first component has
the highest priority and the last component has the lowest
priority. An implementation may restrict the number of
components which a prioritized parallel construct can have.

Figure 3.2 Flow Diagram of the Parallel Construct


A prioritized parallel construct ensures that a
higher priority process always proceeds in preference to a
lower priority one. The progress of a higher priority
process is not affected by any lower priority one, except by
communication on connecting channels. If several concurrent
processes at the same priority are able to proceed, each one
is given an opportunity to proceed in turn.
3. Alternative 


Sometimes a process has a number of channels
associated with it and needs to perform one of a number of actions
depending on which channel first sends it a message (Figure
3.3).

This is achieved using the alternative construct,
which chooses just one of its inputs for execution. The
keyword ALT is followed by a list of guarded processes (Table 6).


Figure 3.3 Flow Diagram of Alternative Construct


TABLE 6
ALTERNATIVE CONSTRUCT IMPLEMENTATION


ALT
  guard-process 1
    process
  guard-process 2
    process


An alternative process waits until one of its guarded
processes is ready to execute. One of the ready guarded
processes is then selected and executed. The construct then
terminates. A guarded process starting with an input from a
channel is ready if an output process is waiting to output
to the channel. If the guarded process is selected, the
component process is executed. If a guard contains an
expression followed by an input or wait, the guarded process
is ready only if both the value of the expression is TRUE
and the input or wait is ready. If a guarded process is itself
an alternative construct, then it is ready if one or more
component guarded processes of the alternative construct is
ready. If more than one guarded process becomes ready at
the same time, an arbitrary one is selected; this may occur
if they contain inputs on the same channel.
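
The selection rule for guarded processes can be sketched in Python. The sketch makes several assumptions that are not in the text: each guard is a (condition, channel, process) triple, a queue.Queue stands in for the channel, readiness is found by polling, and among several ready guards the first in textual order is taken (one permitted choice, since the selection is arbitrary). The function name alt is invented for the example.

```python
import queue
import time

def alt(guards, poll_interval=0.001):
    """Wait until one guarded process is ready, run it once, then return.

    A guard is ready only if its condition is true and a value is
    waiting on its channel. If every condition is false, the call
    never returns, matching a construct that is never ready.
    """
    while True:
        for condition, channel, process in guards:
            if not condition:
                continue
            try:
                value = channel.get_nowait()
            except queue.Empty:
                continue                 # this guard is not ready yet
            return process(value)        # exactly one component is executed
        time.sleep(poll_interval)
```

A real implementation would block rather than poll; polling is used here only to keep the example short.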


4. Repetitive 


The repetitive construct takes the form of the
keyword WHILE followed by an expression, followed by a
single component process indented on the next line (Table 7).


TABLE 7
REPETITIVE CONSTRUCT IMPLEMENTATION


WHILE expression
  process





The component process is executed repeatedly until
the expression evaluates to FALSE, and the construct then
terminates. If the expression is initially FALSE, the process is
not executed and the construct terminates immediately
(Figure 3.4).


5. Replicator


Figure 3.4 Flow Diagram of the Repetitive Construct


A replicator is used with a constructor to replicate
the component process a number of times (Table 8). A
replicator can be used with a parallel construct to construct an
array of concurrent processes. It can also be used with the
alternative construct for reading from an array of channels,
and with the sequential construct to provide a conventional
loop.

The replicator declares an identifier to be the
replicator index, giving its base value and a count of the
number of replications required. Its effect is to form a
sequential, parallel, alternative or conditional construct
containing count components by replicating the component
process, substituting successive integer values for the
replicator index (starting at base). The substituted value
for the replicator index in the last component will be
(base + count - 1).


TABLE 8
REPLICATOR PROCESS IMPLEMENTATION


SEQ i = [ base FOR count ]
  process


Figure 3.5 Flow Diagram of Replicator Construct


The replicator index can be used in expressions but
not in constant expressions; it may not be changed by
assignment or input. An implementation may restrict the values of
base and count to be constants, particularly when a
replicator is used to form a parallel construct. If count
evaluates to less than or equal to zero, then an empty
construct is generated. This has the effect of termination
for sequential, parallel and conditional constructs, and the
effect of never being ready to execute for alternative
processes. The flow diagram of this construct is shown in
Figure 3.5.
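
The index values produced by a sequential replicator can be checked with a short Python sketch. The function name replicate is an assumption made for the example; it models only the SEQ form, with the component process represented by an ordinary function of the index.

```python
def replicate(base, count, process):
    """Model 'SEQ i = [base FOR count]': run process(i) for each index in turn.

    The index starts at base and ends at base + count - 1. A count of
    zero or less yields an empty construct, which simply terminates.
    """
    results = []
    for i in range(base, base + count):
        results.append(process(i))
    return results
```

With base 3 and count 4 the substituted index values are 3, 4, 5 and 6, so the last component sees base + count - 1.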
6. Conditional 


A conditional takes the form of an expression
followed by a process, and it is able to execute if the
expression evaluates to TRUE. The syntax form is shown in
Table 9. A conditional construct takes the form of IF
followed by component conditionals, and it is able to execute
if one of its component conditionals is able to execute.


TABLE 9
CONDITIONAL CONSTRUCT IMPLEMENTATION


IF
  expression
    process
  NOT expression
    process


The conditional process executes the first component
(textually) which is able to execute, and then terminates.
If there is no component able to execute, then the construct
terminates with no other effect. At most one component is
executed (Figure 3.6).
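
The first-match rule described above can be sketched in Python. The representation is an assumption made for the example: each component conditional is a pair of functions (expression, process), and the function name conditional is invented here.

```python
def conditional(components):
    """Execute the first component whose expression is true.

    Returns True if a component was executed and False if none was
    able to execute, in which case nothing else happens.
    """
    for expression, process in components:
        if expression():
            process()
            return True       # at most one component is executed
    return False
```

Textual order matters: when two expressions are both true, only the earlier component runs.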





Figure 3.6 Flow Diagram of Conditional Construct

D. CONFIGURATION


Configuration is used to meet speed and response
requirements by distributing programs over separate,
interconnected computers, and by placing and prioritizing
processes on single computers.

Every computer has local store and a set of numbered 
ports. A physical connection between two computers connects 
a port on one computer to a port on the other computer. This 
implements up to two channels between the computers, one in 
each direction. 

A parallel construct may be configured for a network of
computers. Each computer executes a component process, and
port allocations are used to allocate channels to ports.

A parallel construct may be configured for an individual
computer. The computer shares its time between the
component processes, and the channels are implemented by values
in store. Indeed, a parallel construct configured for a
network may be reconfigured for an individual computer.

A PRIPAR construct can be used to provide prioritized
component processes, and an alternative construct can be
used to provide the prioritized input primitives.

The allocation of processing resources to the concurrent
processes in a program does not affect the logical behaviour
of the program. Simple implementations may omit or ignore
some or all of the configuration facilities.

IV. SYSTEM STRUCTURE


A. FAULT TOLERANT SYSTEM
1. Objectives 


A designer may have multiple objectives for a
redundant system. However, central to the theme of fault
tolerance are those of:

a. Availability

The average probability that a system is functional
at any given time.
b. Error Rate


The average rate at which a system's output 


makes a transition to an unacceptable state. 
c. Reliability


It is described as a mean time to failure for a
system: the average length of time a system retains some
operational utility without external maintenance.
d. Dependability 


It is a statement of system availability through
a specified period of time and is a function of availability
and reliability.
e. Maintainability 


The average period required to return the system 
to operational status or in some cases to the original 


state. 
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f 2S Growth 


ine explvettyrecoenition of a System to facili- 


tate realization of future changes. 
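
The objectives above are often quantified together. The
following is a minimal sketch, assuming the standard
steady-state relation A = MTTF / (MTTF + MTTR), which this
text does not state explicitly. It estimates availability
from a mean time to failure (reliability) and a mean time to
repair (maintainability). The function name and the sample
figures are illustrative assumptions only.

```python
def steady_state_availability(mttf_hours: float, mttr_hours: float) -> float:
    """Estimate availability as MTTF / (MTTF + MTTR).

    Assumption: the standard steady-state relation; the figures used
    below are illustrative, not measurements of any real system.
    """
    if mttf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTTF must be positive and MTTR non-negative")
    return mttf_hours / (mttf_hours + mttr_hours)


# Hypothetical system: fails on average every 990 h, repaired in 10 h.
example = steady_state_availability(990.0, 10.0)
```

A shorter repair time raises availability even when the mean
time to failure is unchanged, which is why maintainability is
listed as an objective in its own right.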
2. Fundamental Aspects 


There are three kinds of failures for a system which
produce an anomalous condition or function:

Limited "normal" environmental failures, which cause
components to weaken or break over a long period of time.

Abnormal environmental failures, which disrupt components in
a short time or propagate faults introduced from outside of
the system.

Man-made failures: hardware or software design flaws,
man-machine interaction/interface disruptions or deliberate
intrusion [Ref. 7].


Malfunctions are anomalous conditions resulting from
threats. These include physical malfunctions (component
failure), signal or logic level malfunctions (faults), data
level malfunctions (errors), control breakdown and system
level malfunctions (catastrophic failures).
Fault classification consists of three dimensions:

a. Duration
Transient, intermittent, pseudo transient, persistent and
permanent.

b. Extent
Local vs. distributed.

c. Value
Determinate or indeterminate.

Temporary failures are the ones most difficult to detect in
real time systems.




A transient failure is a nonrecurring temporary failure,
usually caused by fleeting phenomena such as radiation,
noise and power fluctuations.

An intermittent failure is a recurring failure that
reappears on a regular basis, often caused by critical
tolerances in timing or electrical signal levels. If an
intermittent failure is present in a circuit, it may be
'active' at one instant or 'inactive' at another.
Fault tolerant systems resist threats and surmount
malfunctions in various ways. Active redundant systems
attempt to observe malfunctions and then substitute elements
(hardware or software) as required, with a possible
interruption in service. Passive redundant systems, often
using more resources than active systems, form the correct
output based on a consensus of actively redundant elements,
with no need to observe the malfunction. Hybrid redundancy
combines advantages of active and passive systems by
observation and substitution. Self purging systems take
advantage of the same techniques through observation and
deletion of all failing elements on line. In active
redundant systems, substitute elements require testing,
else they may be useless when needed. In passive systems,
consensus mechanisms are a prime candidate for latent
faults.


3. Techniques 


The key to successful application of protective redundancy
is a systematic and balanced selection of the basic forms:

- Hardware (structural) : additional components
- Software (functional) : special programs
- Temporal : operation repetition
a. Hardware


Methods of introducing hardware redundancy may be divided
into two categories, on the basis of terminal activity of
the modules.

Static: This method is also known as "masking", since
components are employed to mask the effects of hardware
failures. Three forms of static redundancy have been used in
practice: component replication for individual electronic
components, triple modular redundancy (TMR) with voting for
logic circuits or larger modules of a computer, and
quadruple modular redundancy, where four processing elements
are used in place of one element. The four elements are
paired, two elements per board. The results of each
processing step are compared within a board. If the results
do not agree, the board is declared "faulty", and its
results are not permitted to propagate in the system. The
board in which the results agree continues the operation
alone until the faulty board is replaced and normal
operation is resumed.
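
The masking idea behind triple modular redundancy can be
illustrated with a small voter. This is a sketch only: the
function name is an assumption, and a real voter would be
implemented in hardware or at a much lower level than this.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def majority_vote(outputs: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the value produced by a strict majority of modules.

    Returns None when no value has a strict majority, i.e. the
    fault cannot be masked.
    """
    if not outputs:
        return None
    value, count = Counter(outputs).most_common(1)[0]
    return value if count * 2 > len(outputs) else None


# Three replicated modules; the second one is faulty.
masked = majority_vote([42, 17, 42])
# No two modules agree, so the voter cannot mask the fault.
unresolved = majority_vote([1, 2, 3])
```

With three modules a single faulty output is outvoted, which
is why the fault never appears at the terminals of the
replicated unit.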

Dynamic: In this hardware redundancy approach, fault caused
errors are allowed to appear at the terminals of a module.
Fault tolerance is then implemented by two consecutive
actions. First, the presence of a fault is detected, and
then a recovery action either eliminates the fault or
corrects the error which was caused. Redundancy within the
operational system is therefore introduced in a selective
rather than massive fashion.

Application of dynamic redundancy requires that a number of
design choices be made in the functional design stage.

(1) Modularization. The designer must define a modular
architecture with emphasis on a minimal number of
connections while trading off desired partitioning.
(2) Fault Detection. The objective is to select real time,
concurrent fault detection methods. These methods are error
codes, status signals, duplicated operations, internal
monitoring of critical events, completion signals, watchdog
timers, reasonableness checks and totally self checking
circuits.

(3) Recovery Actions. Upon detection of the fault, a
recovery procedure must take place. If a program restart
fails to correct the error, a permanent fault is assumed.

Reconfiguration is a case of replacement in which, if a
spare is not available, the processing capability of the
machine is decreased, and a decision must be reached
concerning which programs require a lesser degree of
survivability, when graceful degradation of processing
capacity is acceptable, and what minimum processing capacity
is desired prior to entering a safe shutdown mode.

(4) Inter-module Communication Choice. This is a major
tradeoff in dynamic systems. Alternatives are bus
communication and direct module to module paths.


b. Software Redundancy

Software redundancy includes all additional programs and
program segments which would not be included in a computer
with fault free hardware. Major forms of software redundancy
are:

- Multiple storage of critical programs and data

- N-version programming

- Test and diagnosis programs

- Fault tolerant features

- Recovery mechanisms

Compared to hardware redundancy, one advantage of software
redundancy is the ability to superimpose fault tolerance on
"off the shelf" items. Another advantage is ease of
modification and refinement. The main disadvantage is the
difficulty of assuring that software will be able to
function correctly after a fault occurrence or that it will
be invoked sufficiently early to prevent system
contamination.
c. Time Redundancy

This form of redundancy consists of repeating or
acknowledging machine operations at various levels:
microoperations, single instructions, program segments or
entire programs. Usually the distinct goals are fault
detection by means of repeated execution and recovery by
restart or operation reentries.

A common use of time redundancy is found in identification
and correction of errors caused by transient faults, and in
program restarts after a hardware reconfiguration. This is
accomplished by repetition of single instructions, program
segments or entire programs.
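
Recovery by repetition can be sketched as a bounded retry. In
this illustration a RuntimeError stands in for a detected
fault, and the names run_with_time_redundancy, PermanentFault
and make_flaky are assumptions made for the example, not part
of the system described in this thesis.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class PermanentFault(Exception):
    """Raised when repetition does not clear the fault."""


def run_with_time_redundancy(operation: Callable[[], T],
                             max_attempts: int = 3) -> T:
    """Repeat an operation until it succeeds or attempts run out."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return operation()
        except RuntimeError as error:   # stand-in for a detected fault
            last_error = error
    raise PermanentFault(f"failed {max_attempts} times") from last_error


def make_flaky(failures: int) -> Callable[[], str]:
    """Build an operation that fails `failures` times, then succeeds."""
    state = {"left": failures}

    def operation() -> str:
        if state["left"] > 0:
            state["left"] -= 1
            raise RuntimeError("transient fault")
        return "ok"

    return operation
```

A fault that disappears within the allowed number of
repetitions is treated as transient; one that persists is
escalated, in line with the rule that a permanent fault is
assumed when a restart fails to correct the error.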

All these methods may be conveniently grouped according to
the time of their application with respect to normal system
operation.

(1) Initial Testing, which takes place prior to normal use
and serves to identify elements containing imperfections
introduced during production.

(2) Concurrent Detection, which takes place simultaneously
with normal operations. This is implemented by a variety of
error detecting codes.

(3) Scheduled Detection, which takes place when normal
operation is temporarily interrupted to test for faults and
may be similar to initial testing, with the main difference
being limited time and a "self test" approach.

(4) Redundancy Testing, which serves to verify that various
forms of redundancy are themselves fault free and ready to
act.


B. MULTIPROCESSING SYSTEMS


Multiprocessing systems are being, and will be, used in a
large number of applications such as control of electric
power generation, distribution, and consumption, nuclear
power processing facilities, safeguarding and control,
healthcare delivery in hospitals and medical centers,
climate control, security, waste disposal and fire
protection in large buildings, and largely in defense
systems.

Why are multiprocessor systems useful in all these
applications? There are several reasons: they usually make
it easier for the user to access the system, they generally
provide increased performance through resource sharing, and
they often increase the availability of a system. A network
of microprocessors can quite often duplicate the capability
of one large expensive system at lower cost. Multiprocessor
systems can provide adaptability and rapid reconfiguration,
with the system functioning at different times as a very
large and complex problem solver, as a network of smaller
machines each dedicated to a unique task, or as something
in between. They can usually also provide increased
reliability, since the total system can continue to operate
despite individual processor failures, albeit with reduced
capabilities, provided that some of the links between the
processors remain intact. Also, since redundancy can be
achieved at a lower cost using processors distributed over
a large area, the survivability of the system, particularly
in military applications, can be increased. Furthermore, a
distributed processing system can provide increased,
distributed power and responsiveness because it can be
closely tailored to the application. Additional
multiprocessor systems can be provided as needed to ensure
proper response time.

Multiprocessor systems can also be designed to be cost 
effective when applied to a wide variety of applications, 
where the number of processors can be determined by the 
distributed processing requirements. A properly designed 
distributed processing system threatened by overload can be 
incrementally expanded by simply adding more processors. 

The disadvantages may or may not outweigh the advantages,
depending on the system-unique requirements. On the minus
side, the designer may be faced with increased software
complexity. Application software may be more costly to
develop for a distributed than for a centralized system. In
contrast to a single central processor based system with
only one executive, a distributed system typically requires
each processor to contain its own, individual executive that
must be capable of communicating with all the other
executives in the total system. This, in turn, will require
that each individual executive provide a task handling
capability where tasks resident in various processors can
communicate with each other and, in case of local software
or hardware errors, that diagnostic capabilities exist to
localize 'bugs'. This is not to say that diagnostic or error
checking software is not needed or used in large
centralized, single processor systems; however, the
diagnostic software development for a distributed system is
usually more difficult and costly.

A distributed processing system, by definition, is also
more dependent on communication technology, particularly
where the computers are widely dispersed and the peak
traffic demands between the computers are high.

Finally, the design and development of a unique distributed
processor system may require expertise in both hardware and
software areas. The advantages and disadvantages of
distributed multiprocessing systems are given in Table 10
[Ref. 8].


TABLE 10
ADVANTAGES AND DISADVANTAGES OF MULTIPROCESSING
SYSTEMS


ADVANTAGES                    DISADVANTAGES

Increased reliability         Increased software complexity
Increased survivability       Difficult system testing
Increased processing power    More communication
Increased modularity          Unique expertise needed
System expandability





C. DESIGN METHODOLOGY 


The purpose of this implementation is to establish a
multiprocessing system which provides fault tolerance using
a new VLSI product, the T424 transputer microchip. Before
the design, the following techniques and methods are assumed
for the system.

- Structural hardware redundancy
- Functional software redundancy
- Operation time redundancy

Using more computing elements will provide for both
multiprocessing capacity and hardware redundancy. The number
of computing elements in the multisystem is chosen as
sixteen. The reason for choosing that number for the
prototype system is briefly described in the following
sections.

Functional software redundancy and operation time
redundancy are provided by a fault tolerant operating system
design, which is explained later in this chapter.
As described in Chapter 1, the hardware provides four
communication channels to use in a system configuration;
therefore the possible system constructions may be listed as
follows.

a. Pipelines/Rings Structure

This structure consists of computing elements connected to
each other with two channels, which also provides redundancy
for the communication channels between two computing
elements, as in Figure 4.1.





Figure 4.1 Pipelines/Rings Structure


b. Tetragonal 3-D Construction

In this structure each computing element is connected to
three elements, and together they build a new computing
group which still has four available communication channels
for connections to other computing groups (Figure 4.2).

This structure is one of the basic structures that can also
be found in many kinds of higher level structures. For
example, the matrix structure in Figure 4.4 can use the
tetragonal structure as a computational element which is
connected to the four neighboring elements.


Figure 4.2 Tetragonal 3-D Construction


c. Butterfly Construction

This is a special implementation of a pipeline structure,
which is a good solution for the fast Fourier transformation
or similar engineering applications (Figure 4.3).


d. Matrix Construction

This structure consists of the connections of the computing
elements to each neighboring element in two dimensions.
There is no further channel redundancy between two computing
elements, but it provides a very large number of
communication rings and multiple communication paths. This
structure will be described later in this chapter in detail.
The basic structure of this type is given in Figure 4.4.

The matrix structure has been chosen for the proposed
prototype multiprocessing system. The basic reason is that
providing multiple communication paths gives many
application orientation possibilities. Many communication
rings can be implemented in this kind of system, and
pipeline type processing can also be applied.


Figure 4.3 Butterfly Construction


Figure 4.4 Matrix Structure
The number of processors is determined by the number of
rows and columns contained in the matrix. To provide a
symmetrical structure, the square matrix is a solution.
Hence the probable number of computing elements would be 4,
9, 16, 25 and so on.

The number 16 is chosen to build an intermediate prototype
multiprocessor. The previously discussed design assumptions
lead us to the multitransputer structure in Figure 4.5.





Figure 4.5 The Structure of the Multitransputer System 
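
The neighbor relation of a square matrix of computing
elements can be sketched as follows. The numbering scheme
(row by row from 0) and the optional wrap-around at the edges
are assumptions made for illustration; the actual wiring of
Figure 4.5 is not reproduced here.

```python
from typing import List


def grid_neighbors(node: int, side: int = 4, wrap: bool = False) -> List[int]:
    """Return the nodes linked to `node` in a side x side matrix.

    Assumptions (not taken from Figure 4.5): nodes are numbered
    row by row from 0, and edges wrap around only if `wrap` is True.
    """
    if not 0 <= node < side * side:
        raise ValueError("node out of range")
    row, col = divmod(node, side)
    neighbors = []
    # Visit the up, down, left and right positions in that order.
    for d_row, d_col in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + d_row, col + d_col
        if wrap:
            r, c = r % side, c % side
        if 0 <= r < side and 0 <= c < side:
            neighbors.append(r * side + c)
    return neighbors
```

For a 4 x 4 matrix an interior element uses all four links.
Without wrap-around, corner and edge elements have links left
over, which would remain free for other connections.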
D. PROPOSED OPERATING SYSTEM


In the previous section the multitransputer system has been
described, and in this section the operating system to be
used in the prototype multitransputer system will be
described. This operating system includes both
multiprocessing and fault tolerance features.

Basically the proposed system includes three main parts: the
Fault Tolerance Controller, the Sequencer and the Link
Controllers. These operating system processes work to
control the user processes to provide the above mentioned
features of the multitransputer system. The prototype
operating system structure and its internal connections are
shown in Figure 4.6.


[Figure 4.6 block labels: FAULT TOLERANCE CONTROLLER,
SEQUENCER, LINK CONTROLLER, USER]


Figure 4.6 Prototype Operating System Structure


a. Fault Tolerance Controller

This subsystem of the operating system includes self check
programs, diagnosis programs, watchdog timers, hardware
fault tolerance interfaces (if provided) and others. These
processes will be activated by means of fault detection or
run time control. The entire fault tolerance process holds
the first priority level among all other subsystems and
processes.

The fault tolerance system has not been designed in this
thesis, but this section gives a high level structural
design, a description of its role in the operating system
and some suggestions.

The main purpose of this system is diagnosing and recovering
from errors, which are found either in other systems or in
the self check processes of the fault tolerance system.

The system inputs come from the link controller subsystem,
the sequencer subsystem and the fault tolerance system
timers or hardware interfaces. The diagnosis and recovery
processes must be designed to satisfy the basic objectives
of fault tolerance, which are reliability, maintainability
and dependability. The system outputs consist of sequencer
and link controller outputs, which are used to reconfigure
the multiprocessing system and system operation.
b. Sequencer

This subsystem of the operating system provides the
multiprocessing system organization. In other words, it
determines which particular transputers execute which user
processes. Most probably its inputs will be both fault
tolerance inputs (hardware reconfiguration for particular
transputers) and link controller inputs. The link controller
provides the communication with the sequencer systems of
other transputers. Therefore the system reconfigurations
will be known to the sequencer subsystem.

Probable processes of this system will be some communication
protocols for other sequencer subsystems and organization of
the processes, such as cancelling some operations or
restarting others.
c. Communication Controller

The communication controller subsystem will be explained in
detail next in this chapter. The main task of this subsystem
is to provide inter and intra communication between
processes, systems and transputers. It will also provide a
fault detection feature during communications, using
watchdog timers and acknowledging techniques.


E. COMMUNICATION SUBSYSTEM DESIGN
1. Design Objectives 


As described in the previous paragraph, this subsystem
handles the inter and intra communications between
processes. These processes could be user processes,
sequencer processes, fault tolerance processes or, if
present, other processes.

Before we explain the design details, it is better to give
some conceptual ideas about this system. In the multisystem
(Figure 4.5), transputers are connected to each neighbor in
two dimensions; therefore we may use delay insertion loop or
token passing types of communication protocols [Ref. 8]. In
this design some features of both types of communication
have been used to obtain maximum efficiency and redundancy
for failures.

A second feature of the communication protocol is to provide
error detection, which is achieved using both acknowledging
and watchdog timer techniques.

To explain the communication subsystem design, the simple
process structure of that system will be explained first. As
seen in Figure 4.7, the communication process accepts inputs
from both hardware channels (links) and software channels.
These inputs cause the activation of the proper
communication process. The execution result will be another
output from either hardware channels (links) or software
channels. This procedure provides the communication between
the sender process and the receiver process.


[Figure 4.7 block labels: COMMUNICATION SUBSYSTEM, FROM
INNER PROCESSES, TO INNER PROCESSES]


Figure 4.7 Communication System I/O Block Presentation


The communication subsystem has two input groups and two
output groups; therefore there will be four different
communication types. These are simply described as: from
outer transputer to outer transputer (by-pass), from outer
transputer to inner process (internal distribution), from
inner process to outer transputer (external distribution)
and from inner process to inner process (short-cut).
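
The four communication types follow directly from the source
and destination groups. A minimal sketch of that mapping is
given below; the enumeration and the function name classify
are assumptions made for illustration.

```python
from enum import Enum


class CommType(Enum):
    BY_PASS = "by-pass"
    INTERNAL_DISTRIBUTION = "internal distribution"
    EXTERNAL_DISTRIBUTION = "external distribution"
    SHORT_CUT = "short-cut"


def classify(from_outer: bool, to_outer: bool) -> CommType:
    """Map the (input group, output group) pair to a communication type."""
    if from_outer:
        return CommType.BY_PASS if to_outer else CommType.INTERNAL_DISTRIBUTION
    return CommType.EXTERNAL_DISTRIBUTION if to_outer else CommType.SHORT_CUT
```

Messages arriving from an outer transputer pass through the
receiver and decoder, while messages from inner processes
pass through the listener and encoder, so the first argument
also identifies which pair of subprocesses handles the
message.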

In this communication protocol the token is used to
determine the receiver process in the system. The token is
the leading byte of the message, which includes the message
type, receiver transputer number and receiver channel number
information. This token is produced by the communication
system itself and is also used by the same system to
determine the communication type. The token has been named
CODE for implementation. The CODE is a sixteen bit word, a
two's complement integer in the range -32768 to 32767.


After a brief explanation of the subsystem, the design
requirements can be listed as follows.

a) Listening to all links and user channels
b) Determination of the communication type
c) Transmission through the proper channel or link

The communication system design is shown in Figure 4.8,
which allows us to achieve the previous design requirements.


[Figure 4.8 block labels, partially legible: TRANSMITTER,
INTERFACE, LISTENER]


Figure 4.8 Detailed Structure of Communication Subsystem


a. Receiver

Listens to the four hardware links, receives both the CODE
and the following DATA, if it exists, and activates the
decoding process. In case of receiving errors, the fault
tolerance system is activated.

b. Decoder

Determines whether the communication type is a by-pass or an
internal distribution and, if it is an internal type,
determines the number of the inner channel.

c. Sender

Sends the activation signal or data taken from either the
encoder or the decoder to the appropriate process.

d. Listener

Listens to all user channels and receives the data or
activation message.

e. Encoder

It is activated by the listening process and determines the
CODE according to the multiprocessing configuration provided
by the sequencer subsystem. Either the transmit or the send
channel is activated, depending on whether the communication
type is external distribution or short-cut.

f. Interface

Determines the logical path to send the message to the
desired transputer. It uses the hardware configuration
which is provided by the fault tolerance subsystem.

g. Transmitter

Transmits the messages through links to other transputers
using acknowledgements. It uses the link number determined
by its interface process and activates the fault tolerance
system in case of transmission problems.

h. Duplexer

Closes the receiver during transmission to prevent the
system from false communication.
2. Implementation of the Communication System 


Before starting to explain the details of the
implementation, a brief explanation will be given about the
CODE (or token) used in this system.

As described before, the CODE consists of a two's complement
binary sixteen bit word, which can take on the maximum
positive decimal value of 32767. The first digit of the five
digit code represents the type of code. In other words, this
digit shows whether there is following data or not. If this
digit is zero, there is no data following the code. If it is
1, 2 or 3, data will be assumed to follow.

The next two digits are used to show the process number in
case data does not exist. Therefore we can use 99 operating
system processes within this system. If the information
sequence includes data, the process number will be
determined by the first 3 digits, in the range 100 - 327, so
we can use 227 user processes within this system.

The last two digits show the number of the transputer that
is to receive that information sequence. Therefore the
maximum possible transputer number in this system is 67.
These values and their meanings are shown in Table 11.
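
The digit layout of the CODE can be made concrete with a
short sketch. It treats the CODE as a decimal number whose
last two digits are the transputer number and whose leading
digits are the process number, as described above. The names
encode, decode and Code are assumptions, as is the choice to
let transputer numbers start at 0; the actual OCCAM
implementation is the one given in the appendix.

```python
from typing import NamedTuple

MAX_TRANSPUTER = 67   # limit stated in the text (from the maximum 32767)


class Code(NamedTuple):
    has_data: bool
    process: int
    transputer: int


def encode(process: int, transputer: int, has_data: bool) -> int:
    """Pack a process number and transputer number into a CODE value."""
    # With data the process number is 100..327, without data 1..99.
    low, high = (100, 327) if has_data else (1, 99)
    if not low <= process <= high:
        raise ValueError("process number out of range for this code type")
    if not 0 <= transputer <= MAX_TRANSPUTER:
        raise ValueError("transputer number out of range")
    return process * 100 + transputer


def decode(code: int) -> Code:
    """Split a CODE value back into its fields."""
    if not 0 <= code <= 32767:
        raise ValueError("code out of range")
    process, transputer = divmod(code, 100)
    has_data = code >= 10000   # first of the five digits is 1, 2 or 3
    return Code(has_data, process, transputer)
```

For example, operating system process 12 on transputer 5
without data gives the code 01205, and user process 327 on
transputer 67 gives 32767, the largest value the sixteen bit
word can hold.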


a. Receiver

The OCCAM program of this subprocess is given in Appendix A.
This subprocess has five inputs and one output. All channels
used in this process are bidirectional to provide an
acknowledgement procedure. The processing of this subprocess
can be separated into three portions.


TABLE 11
CODE AND THE MEANINGS OF THE DIGITS


max. code value : 32767

without data     0 PP T T     PP    : process no.
                              T T   : transputer no.

with data        x PP T T     x P P : process no. (x = 1, 2, 3)
                              T T   : transputer no.





The first process is to wait for an external message event.
This is achieved using an alternative construct. All guards
are inputs from the hardware links. When an input event
occurs, the variable ACTIVE.LINK is assigned the
corresponding link number. This variable is used to make a
handshake through the same channel. It is also used in the
interface process to determine the shortest transmission
link to reach the destination.

The acknowledgement signal is sent through the active link
in the second portion of this subprocess. Also, the code
type is determined to find out if data exists following the
CODE. If data exists, it is received in the same manner as
the CODE. When the receive process is completed, the decoder
process is activated and the receiver process waits for the
synchronization signal to prevent system deadlocks.

Another task of the receiver process can be named "close
receiver". The purpose of this process is to prevent the
system from deadlock during the transmission process,
because both the receiver and the transmitter processes use
the same hardware channels at the same time.
"Close receiver" is achieved by the duplexer subprocess.
This process sends the "close receiver" signal through the
RECEIVER.OFF channel. When this signal is received by the
receiver, the variable WAIT is assigned TRUE. This
assignment causes waiting for the next synchronization
signal from the duplexer, which occurs at the end of the
transmission procedure. The wait function will be disabled
before the program starts to wait for external input through
the links.
b. Decoder

This subprocess has one input and two outputs. It is
activated by the receiver process, and the transputer number
of the code is determined by looking at the last two digits.
The next step is to decide if the communication type is
internal or by-pass. According to the communication type,
either the sender process or the transmit interface process
is activated. After each activation, the process waits for a
synchronization signal from the activated process. The last
step before the termination of the process is to send a
synchronization signal to the receiver process. The OCCAM
program implementation of this process is given in
Appendix A.
c. Sender

This subprocess is activated either by the decoder process
or the encoder process. The only task of this process is to
send the message to the specified user process (or operating
system process). Before the termination of the process, a
synchronization signal is returned to the invoking process.
d. Listener

This process consists of repetitive alternative constructs
and guarded processes which read from the user and/or
operating system processes. When an input is taken from any
one of these channels, the encoder process is activated to
continue the communication procedure. The listener process
waits for an acknowledgement from the encoder process before
terminating.
e. Encoder

This subprocess is activated by the listener subprocess. The
first procedure is to determine the receiving transputer
number. This is achieved using the PROCESS.TABLE provided by
the sequencer subsystem.

The next procedure is to determine the communication type
(either external distribution or short-cut). This is
achieved by comparing the target transputer number with the
number of the "own" transputer.

In case of a short-cut, the send subprocess is activated to
send the message to an internal process. If the message is
targeted to an external transputer's process, then a code
word is computed for either the with data or the without
data case, and the transmit interface subprocess is
activated to send that information through the proper
hardware link.

The synchronization signal is awaited from the interface
process before the termination. After that signal is
received, the listener process is released by a
synchronization signal and the process terminates.
f. Interface

This subprocess is activated either from the decoder or the
encoder, depending on the communication type. The main task
of this subprocess is to determine the logical and the
shortest link number to reach the destination. This is
achieved using distance tables and the hardware status
table. The distance tables for the first, second and third
priority levels are given in Appendix C. The basic idea is
to check the hardware status of the link chosen to reach the
specified transputer and to continue the comparison until an
available link is found. If the communication type is a
by-pass, then the link number must be chosen differently
from the active link. In other words, we cannot send the
same message back through the same channel again, which
would cause system deadlock.
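
The link selection rule can be sketched as a lookup through
the priority levels. The distance tables of Appendix C are
not reproduced here, so the table entries below are invented
placeholders, and the names DISTANCE_TABLES and choose_link
are assumptions made for illustration.

```python
from typing import Dict, Optional, Sequence

# Hypothetical tables for one transputer: for each destination, the link
# to try at the first, second and third priority level. The entries are
# invented for illustration and do not come from Appendix C.
DISTANCE_TABLES: Sequence[Dict[int, int]] = (
    {1: 0, 2: 1, 3: 1},   # first priority
    {1: 1, 2: 0, 3: 2},   # second priority
    {1: 3, 2: 2, 3: 0},   # third priority
)


def choose_link(destination: int,
                link_ok: Sequence[bool],
                active_link: Optional[int] = None) -> Optional[int]:
    """Return the first working link toward `destination`.

    `active_link` is the link a by-pass message arrived on; it is never
    chosen again, so the message is not sent straight back.
    """
    for table in DISTANCE_TABLES:
        link = table.get(destination)
        if link is None or link == active_link:
            continue
        if link_ok[link]:          # hardware status table entry
            return link
    return None   # no usable link found at any priority level
```

Here link_ok plays the role of the hardware status table
supplied by the fault tolerance subsystem. How a result of
None would be handled is not specified in this sketch.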

After finding the proper link number, the transmitter
subprocess is activated to transmit the message. The
acknowledgement must be received before the synchronization
signal is sent to the encoder or the decoder subprocess and
the interface process terminates.
g. Transmitter

This subprocess is activated by the interface subprocess
and, after receiving the necessary parameters for
transmission, the duplexer subprocess is activated to
terminate the receiver subprocess in order to prevent
deadlock and faulty communication. When the duplexer
subprocess gives the "receiver is closed" signal, the
transmit interface subprocess is released by the
synchronization signal. The transmission procedure continues
in the same manner as the receiving process. After
transmission is completed, the duplexer unit is activated to
release the receiver process, and the transmit process
terminates.
h. Duplexer

As mentioned before, this subprocess provides closing and
opening of the receiver during the transmission cycle. First
it waits for the event message from the transmitter to send
a close message to the receiver. Then it acknowledges the
transmitter when the "receiver closed" message is received
from the receiver subprocess. After that the duplexer waits
to get the "transmission completed" message from the
transmitter process and sends the "open receiver" message to
the receiver process before the termination of the process.
V. PERFORMANCE EVALUATION


In this chapter the performance of the communication system
will be evaluated. The first step is the calculation of the
subprocess execution times for different cases. For example,
the execution time of the sender process will be calculated
separately for the decoder activation and the encoder
activation. This calculation will allow us to calculate the
total execution times of the different types of
communications.


A. SUBPROCESSES EXECUTION TIMES


Table 2 is used to evaluate the execution times of the
processes. For a particular system application, IMS 1400
static RAM and IMS 3630 erasable ROM are assumed as external
memories. Therefore, the access times of these memories are
used to calculate the additional external program and data
access times [Ref. 4].

As an example, the receiver process execution time will
be calculated. The method used to calculate the execution
time is taken from [Ref. 6]. The calculation table includes
every type of construct, evaluation and operand in OCCAM.
By inspection of the receiver program in Figure 5.1, the
calculation table can be filled in as follows.


The receive process contains the alternative construct
with guard processes, 5 conditional branches, 1 replicative
construct and no parallel construct.

During the execution there will be 2 parenthesis, 11
constant, 13 variable and 10 vector variable evaluations.

The receive process executes 5 input and 3 output
primitives (maximum case). The arithmetic operations
comprise 2 divisions, 2 comparisons and 2 logic statements.


[OCCAM listing of PROC receive (CHAN decode.advance, receive.off),
with local variables acknowledge and wait. The body of the
listing is not legible in the scanned source.]


Figure 5.1 The Receiving Process 
These counted values provide the quantities for the
calculation table. Subtotals are found by multiplying the
quantities by their execution times, and the summation of
these subtotals gives us the total execution time of the
receiver process.
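
The subtotal and summation procedure can be sketched in a few lines of Python. The quantities are the counts stated in the text for the receive process; the unit times are placeholders chosen only for illustration and are NOT the values of Table 2, so the resulting total is not the thesis figure. The names `quantities`, `unit_time_ns` and `total_execution_time` are hypothetical.

```python
# Quantities counted from the receive process, as stated in the text.
quantities = {
    "conditional branch": 5,
    "replicative construct": 1,
    "parallel construct": 0,
    "parenthesis": 2,
    "constant": 11,
    "variable": 13,
    "vector variable": 10,
    "input primitive": 5,
    "output primitive": 3,
    "division": 2,
    "comparison": 2,
    "logic statement": 2,
}

# Placeholder unit times in nanoseconds (assumption, not Table 2 data).
unit_time_ns = {name: 100.0 for name in quantities}


def total_execution_time(quantities, unit_time_ns):
    """Return (subtotals, total): quantity * unit time, then the sum."""
    subtotals = {name: count * unit_time_ns[name]
                 for name, count in quantities.items()}
    return subtotals, sum(subtotals.values())


subtotals, total_ns = total_execution_time(quantities, unit_time_ns)
```

Replacing the placeholder dictionary with the per-construct times of the calculation table would reproduce the worst-case estimate for the process.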

In the following section these calculations will be used
to estimate the worst case communication performance. Some
of the calculated values may be larger than necessary, but
actual execution times will not be greater than the
calculated execution times.

The following thirteen tables show the instruction types
of the processes and their execution time calculations. In
these calculations the worst cases are assumed.
B. TOTAL COMMUNICATION PERFORMANCE


As described in Chapter IV, there are four different
types of communication. The execution times of these
communication types will be calculated separately in this
section. These four communication types and their
subprocesses are listed in Table 25.


TABLE 25
COMMUNICATION TYPES AND INVOKED PROCESSES

COMM. TYPE        PROCESSING SEQUENCE

By-pass           Receive-Decoder-Interface-Duplexer-Transmitter
Internal Dist.    Receive-Decoder-Send
External Dist.    Listen-Encoder-Interface-Duplexer-Transmitter
Short-cut         Listen-Encoder-Send





1. By-Pass Communication Performance 


In this communication the invoked subprocesses are
receive, decoder, interface (activated by decoder),
transmitter and duplexer. Using Tables 12, 13, 14, 17, 22
and 24, the total time for the by-pass process is found to
be 128.5 microseconds.
2. Internal Distribution Performance 
In this type of communication the receive, decoder and
send (activated by decoder) processes are executed. Using
Tables 12, 16 and 18, the total execution time of this type
of communication is found to be 60.6 microseconds.


3. External Distribution Performance
In this kind of communication, the invoked subprocesses
are listen, encoder, interface (activated by encoder),
transmit, duplexer and receiver closing. Using
Tables 13, 14, 15, 21, 23 and 24, the total execution time
of the external distribution type communication is found to
be 447.3 microseconds.
4. Short-Cut Procedure Performance 


The short-cut type of communication consists of the listen,
encoder and send (activated by encoder) processes. Using
Tables 15, 19 and 20, the execution time of the short-cut
communication is found to be 404.8 microseconds.

The execution times of the last two types of
communication are found to be about 400 microseconds. The
reason for these times is the large number of processes
that are monitored by the communication system. These
execution times can be reduced to 50 - 100 microseconds by
using a smaller number of user processes. Operating system
processes can also be monitored separately from user
processes.


C. FAULT FREE SYSTEM PERFORMANCE


In this section the block data transfer performance will
be calculated for a system with a parallel bus and common
memory, as in Figure 5.2.

That kind of operation requires bus check routines and
data transfer protocols. The execution time of the bus
checking procedure can be neglected compared to the data
transfer time, which simplifies the problem.

Figure 5.2 Fault Free System Structure


The transfer of 1000 * 4 bytes of data requires 1000 memory
access cycles. In this procedure there are no communication
or fault tolerance control procedures. Therefore the minimum
time to access the common and local memories, which are
the same kind of static memories, is calculated using Table
2 as follows:


evaluation [expression illegible]  =  213.5
memory access                      =  100.0
TOTAL . . . . . . . . . . . . . .  =  313.5 microseconds


4 KBytes of information requires a transfer time of 313.5
microseconds, so the data transfer rate for the parallel
bus structure is approximately 100 Megabits per second.
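
The quoted rate can be checked with a short calculation. The sketch below uses only the two figures given in the text (4000 bytes and 313.5 microseconds); the function name `transfer_rate_mbit_per_s` is introduced here for illustration.

```python
def transfer_rate_mbit_per_s(num_bytes, time_us):
    """Bits per microsecond, which is numerically equal to Mbit/s."""
    return num_bytes * 8 / time_us


DATA_BYTES = 4000          # 1000 words of 4 bytes, as in the text
TRANSFER_TIME_US = 313.5   # total transfer time quoted in the text

rate = transfer_rate_mbit_per_s(DATA_BYTES, TRANSFER_TIME_US)
print(f"{rate:.1f} Mbit/s")   # prints 102.1 Mbit/s
```

The result is slightly above 100 Megabits per second, which agrees with the rounded figure stated above.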




D. SYSTEM WITHOUT SHARED MEMORY


In the system without shared memory (parallel bus
failure mode) the common data base will be assumed to
reside in one of the transputers' memories. Transferring
this data to the other transputers requires at least a four
stage operation, where each stage is denoted with stars in
Figure 5.3. In each stage a regular communication protocol
will be assumed. Special subroutines can be provided for
this purpose, so the total data transfer rate will depend
on the serial data transfer rate and the communication
execution times.


Without fault tolerance control, the execution time of the
data transfer procedure for 4 * 1000 bytes will be
determined by the local memory access times and the serial
data transfer rates, as follows:


I/O primitives    :   50 * 1000 + [illegible]
serial transfer   :  625 * 1000
memory access     :  100 * 1000 + 500
TOTAL . . . . . . =  776290 nanoseconds


Distribution of 4000 bytes of data to all transputers
will require the execution of the same procedure four times.
Therefore the total time consumed by the data transfer will
be 3105.1 microseconds. This corresponds to 3.1 microseconds
per 4-byte word (about 0.78 microseconds per byte), or
approximately 0.3 million words (about 1.3 Megabytes) per
second. This evaluation holds only if there is no
interference in the internal buses.
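
The arithmetic behind these figures can be reproduced as follows. The sketch takes the per-stage total of 776290 nanoseconds and the four stages from the text; the 3.1 microsecond figure is reproduced when the total time is divided by the 1000 four-byte words. The constant names are chosen here for illustration.

```python
STAGE_TIME_NS = 776_290    # one transfer stage, total quoted in the text
STAGES = 4                 # the procedure is executed four times
WORDS = 1000               # 4000 bytes as 1000 words of 4 bytes
BYTES_PER_WORD = 4

total_us = STAGES * STAGE_TIME_NS / 1000.0      # total time in microseconds
per_word_us = total_us / WORDS                  # time per 4-byte word
per_byte_us = per_word_us / BYTES_PER_WORD      # time per byte
mwords_per_s = 1.0 / per_word_us                # million words per second
mbytes_per_s = 1.0 / per_byte_us                # Megabytes per second

print(f"total      : {total_us:.2f} microseconds")
print(f"per word   : {per_word_us:.2f} microseconds")
print(f"per byte   : {per_byte_us:.2f} microseconds")
print(f"throughput : {mwords_per_s:.2f} Mwords/s, {mbytes_per_s:.2f} MB/s")
```

These values assume no interference on the internal buses, as stated above; any delay per stage would increase `STAGE_TIME_NS` and lower the rate accordingly.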

If interference is assumed on the internal buses,
there will be delays at each stage of the data
transfer. These delays reduce the data transfer rate.
Figure 5.3 Common Data Transfer


E. EXPANDABILITY 


In this section different computation models are
considered in order to analyze the performance of the
multitransputer system and to show the system expandability.


Assume that the computation proceeds in a linear pipeline
fashion. This model can be adapted to the
multitransputer system in many different configurations. To
provide triple or quadruple modular redundancy, a
multitransputer system may include three or four computation
models in itself. The basic reason is to produce
simultaneous results, so that hardware faults can be found
by comparison when the results disagree. Using triple or
quadruple redundancy, hardware failures will be detected and
located, and correct results will be forwarded to the next
stage of calculation.

As an example of triple modular redundancy, a linear
type of computation model can be used. Assume that
each row of a multitransputer system corresponds to a linear
type computation model, and that the first three rows
execute the same program simultaneously. If one of the three
results is different, the computing elements of that row are
assumed to have failed. The job of that row is assigned to a
spare row while the failed row is diagnosed.
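
The comparison step for triple modular redundancy can be sketched as a majority vote. This is a minimal Python illustration under stated assumptions, not the thesis implementation: the function name `tmr_vote` and the representation of row outputs as a list of comparable values are hypothetical.

```python
from collections import Counter


def tmr_vote(results):
    """Majority vote over the outputs of three redundant rows.

    Returns (voted_value, failed_rows). voted_value is None when no two
    rows agree; in that case no single row can be identified as faulty,
    so every row is reported.
    """
    value, votes = Counter(results).most_common(1)[0]
    if votes < 2:
        return None, list(range(len(results)))
    failed = [row for row, result in enumerate(results) if result != value]
    return value, failed


# Row 2 disagrees with the other two rows and is marked as failed.
value, failed_rows = tmr_vote([42, 42, 41])
```

The row indices returned in `failed_rows` are the ones whose job would be handed to a spare row while the failed row is diagnosed.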

For quadruple redundancy, the system's components are
arranged into removable units in such a fashion that on each
removable unit two computing elements constantly produce
results which are checked for consistency. If the two
elements generate inconsistent results, the removable unit
is assumed to have failed and it will be removed from the
system by service personnel. A spare unit is placed into the
system while the other working pair of computational units
continues to propagate correct results. Allowing the spare
unit to be switched into the system without powering the
system down is an important design feature of such systems.
Figure 5.4 shows an example of this computation model and
its adaptation.
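
The pair-and-spare checking described for quadruple redundancy can be sketched as follows. The sketch assumes two removable units, each carrying two computing elements, and uses None to signal an inconsistent pair; the function names `pair_output` and `quad_output` are hypothetical and the real system would compare results inside the transputer network.

```python
def pair_output(first, second):
    """Return the pair's result, or None if the two elements disagree."""
    return first if first == second else None


def quad_output(unit_1, unit_2):
    """Return the result of the first self-consistent removable unit.

    unit_1, unit_2 : tuples holding the two results produced on each
                     removable unit (assumed never to be None).
    Returns None when both units are internally inconsistent.
    """
    for unit in (unit_1, unit_2):
        result = pair_output(*unit)
        if result is not None:
            return result
    return None


# Unit 1 is inconsistent, so the result of unit 2 is propagated.
result = quad_output((7, 8), (7, 7))
```

An inconsistent unit is the one that service personnel would replace, while the consistent unit keeps supplying results.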


The output element of each line carries out the
comparison procedure. These elements are denoted by "*" in
Figure 5.4.

This type of pipeline computation can be used in
real time systems to compute time dependent events. For
example, radar information would be the input, and the
second and third computing elements can calculate the actual
target coordinates and the weapon system outputs. These
values can be compared by the last elements of the model to
decide on the correct results.

Figure 5.4 Linear Type Computation Model

Another computation model could be a loop type of
computation. Triple or quadruple modular redundancy can also
be applied to that model using three or four computation
loops in the multitransputer system. The computation output
elements must be assigned as decision elements of the
redundant system, as in Figure 5.5.


This type of computational model can also be used in
recursive calculations such as discrete signal analysis.
A triple or quadruple voting process will provide correct
results at the end of each discrete period.
Figure 5.5 Loop Type Computation Model


A star type of computational model can also be applied to
the multitransputer system. As in the previous
applications, triple or quadruple modular redundancy is
provided by three or four computation groups in the system.
An example of that kind of system is shown in Figure 5.6.


That type of modular design may be used for three
stage computations. For example, the first two computing
elements can be used to evaluate both search and track radar
input for a certain time period, the next stage can perform
target evaluations, and the final stage can be used to
present the results of these computations.

Other combinations of modular designs can also be
applied to the multitransputer system using the previous
basic modular structures.
Figure 5.6 Star Type Computation Model
VI. CONCLUSION


A. SUMMARY 


This thesis is focused on introducing the
multitransputer system and its building blocks: the
Transputer T424 and the OCCAM programming language. Also,
the role of fault tolerance in a real time system is
emphasised in the implementation of the multitransputer
operating system.

The serial communication subsystem of the
multitransputer operating system is implemented using the
OCCAM programming language on the VAX 11/780 computer
system.

The performance of the serial communication subsystem is
evaluated and compared with a fault free system. As
evaluated in Chapter V, the proposed multitransputer system
is highly capable of meeting the needs of many present real
time applications and is a good candidate for future
applications. Its fault tolerance capability makes this
system more attractive for many real time applications,
especially military applications.

As far as system dependability is concerned, the
serial communication construct will provide a very high
probability of system functionality even if the
parallel bus fails. Graceful degradation will be provided
by the sixteen computing elements in the multitransputer
system. A common database application is also possible
during parallel bus failure.

The fault free operation is improved by using triple or
quadruple voting mechanisms, which provide fault tolerance
for many real time applications. This feature of the
multitransputer system can also be provided in the parallel
bus failure case. In other words, the serial communication
system is capable of supporting triple or quadruple voting
mechanisms.


B. FOLLOW-ON WORK 


This thesis has addressed only the serial communication
subsystem for the fault tolerant multitransputer operating
system. Possible continuations of this work include other
subsystems of the proposed operating system, or the
formalization of the serial communication subsystem using
ring communication structures.